
August 95 How-To Columns


Windows NT

Take the Express Train To Defragmentation

by: John D. Ruley


[Figure: Fragmentation Effect on NTFS (66KB bitmap)]

[Figure: Look At Fragmentation (89KB bitmap)]

When Deep Dark first made an appearance in this column, I was complaining to him about the surprising lack of basic utilities for NT, such as undelete, defragmentation and antivirus software. This month, I'm delighted to report you can finally buy a defragmenter, and two antivirus applications are now in beta. (Undelete, unfortunately, is still missing.)

I had originally planned to write a First Impression on Executive Software's Diskeeper (800-829-6468)--the first defragmenter for NT--but as I got deeper into testing, I found that the issues are too complex for a First Impression. In a nutshell, defragmentation on NT is very different from defragmentation on DOS or Windows.

At this point, many NT watchers will repeat what Microsofties have said more than once in places like CompuServe's WINNT forum: "Yes, that's true for DOS systems, but not for NT--at least not when running NT File System (NTFS)."

Wrong. NTFS does take some steps to minimize fragmentation. In particular, it sets a variable cluster size (up to 4KB) that depends on the size of the partition. That helps minimize external fragmentation, but it causes some internal fragmentation--that is, wasted space inside allocated clusters. NTFS allocates space only in whole clusters, and files can't share a cluster. So if you store an 8.5KB file and your cluster size is 4KB, NTFS has to allocate three full clusters--12KB--wasting most of the third cluster's space.
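The arithmetic is easy to check for yourself. Here's a minimal C sketch of the cluster math--the file and cluster sizes are just this example's numbers, not anything NTFS-specific:

#include <stdio.h>

/* Internal fragmentation: a file occupies whole clusters, so
   round its size up and see how much space goes to waste. */
int main(void)
{
    unsigned long file_size = 8704;     /* an 8.5KB file */
    unsigned long cluster_size = 4096;  /* 4KB clusters */
    unsigned long clusters = (file_size + cluster_size - 1) / cluster_size;
    unsigned long allocated = clusters * cluster_size;

    printf("Clusters allocated: %lu (%lu bytes)\n", clusters, allocated);
    printf("Wasted (internal fragmentation): %lu bytes\n",
           allocated - file_size);
    return 0;
}

Run it and you'll see three clusters allocated (12,288 bytes), with 3,584 bytes--most of that third cluster--wasted.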

Yes, NTFS does get fragmented

I don't understand why some of Microsoft's NT people insist NTFS doesn't get fragmented. On the contrary, when I checked Microsoft's TechNet CD-ROM, I found references to excessive fragmentation rendering NTFS partitions unusable in NT 3.1 and unbootable in NT 3.5. Severe fragmentation of FAT (File Allocation Table) partitions makes it impossible to convert them to NTFS partitions, and Deep Dark tells me OS/2 HPFS (High Performance File System) partitions can get so badly fragmented that NT can become unusable.

Those are extreme cases. Our real-world concern about fragmentation is its effect on performance. Addressing this concern is difficult because the effect depends on the way you use a system.

Over the past few weeks, I've been experimenting with a test system--my trusty old NCR (today we'd say AT&T Global Information Solutions) 486/33 running NT Server 3.5 that I've deliberately set out to fragment. I've tested hard disk performance using version 2.0 of WINDOWS Magazine's NTHell benchmark, written by Martin Heller. This now includes a direct test of low-level (uncached) hard disk performance. The results are in the chart below.

The chart reveals that fragmentation has little effect on a large (4MB) file's uncached performance, but its effect on a cached 1MB file is impressive. That makes sense. In uncached low-level access, the hard disk is already so slow that the extra time spent slewing the hard disk heads to multiple fragments doesn't make much difference. But in cached access, where most data is read from or written to memory, the extra overhead for fragmentation becomes significant.

How significant? An unfragmented hard disk can be five times faster than one that's severely fragmented. To reach this conclusion, I ran a test program on an NTFS partition. I wrote files of random sizes from 0 to 32KB until I ran out of hard disk space. Then I deleted them and wrote more files. I kept this up for several hours.
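A program like that is easy to approximate. Here's a minimal C sketch of the idea--not my actual test code, and the file count and names are made up for illustration:

#include <stdio.h>
#include <stdlib.h>

#define MAX_FILES 10000
#define MAX_SIZE  (32 * 1024)   /* random sizes from 0 to 32KB */

static char buffer[MAX_SIZE];

int main(void)
{
    char name[32];
    int i;

    for (;;) {   /* keep this up for several hours */
        /* Write random-sized files until a write fails,
           meaning the disk is full. */
        for (i = 0; i < MAX_FILES; i++) {
            size_t size = (size_t)(rand() % (MAX_SIZE + 1));
            FILE *fp;
            sprintf(name, "frag%05d.tmp", i);
            if ((fp = fopen(name, "wb")) == NULL)
                break;
            if (fwrite(buffer, 1, size, fp) < size) {
                fclose(fp);
                break;          /* out of disk space */
            }
            fclose(fp);
        }
        /* Delete them all, leaving scattered holes in the
           free space, and go around again. */
        for (; i >= 0; i--) {
            sprintf(name, "frag%05d.tmp", i);
            remove(name);
        }
    }
}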

If your hard disk is mostly full, as mine usually is, the situation will be at its worst. When you run out of hard disk space and delete small files to make room for a large one, the new file has to be split across the scattered holes the deletions leave behind--you're fragmenting your disk.

Now you know two things: NT disks--even those formatted using NTFS--can be fragmented, and fragmentation impacts performance. The question is: What can you do about it?

These problems have been around since computers acquired hard disks, and lots of good defragmenters are available for DOS. If you're running NT with DOS-compatible FAT partitions, an easy solution is to boot DOS and run a DOS-based defragmenter. Be careful with NT 3.5 and later, though, because DOS defragmenters don't know how to deal with long filenames and can make a mess of your directory structure. Use a Windows 95-compatible defragmenter instead.

The trouble with this approach is that it requires a reboot, and many NT users (especially those running NT Server) don't run FAT partitions. What you need is a native NT defragmenter.

Fortunately, now there is one: Executive Software's Diskeeper for Windows. This utility is a full-blown native defragmenter that supports FAT and NTFS (but not HPFS) partitions, and it has a fascinating history.

Those who've followed NT's development know it's the brainchild of David Cutler and a group of programmers he brought to Microsoft from Digital Equipment Corp. Before building NT, Cutler had been responsible for the RT-11 and RSX-11M operating systems on Digital's PDP-11 series of minicomputers, and for the VMS operating system on the VAX. Although it looks different, NT bears considerable similarity to those systems, and follows the same development philosophy. That philosophy holds that disk fragmentation--within reason--is a good thing.

Cutler's group sees NT Server, like VMS on the VAX, as a multiuser operating system. A DOS-style defragmenter locks the entire hard disk while it moves data from fragmented files into contiguous free space, so the system is unavailable the whole time it runs. That's bad. Better to put up with the system slowing down a bit. Or, if performance is really impacted, back up all the data on the server, reformat the disk and restore--the restored files come back contiguous.

Just as with NT users today, VMS users weren't thrilled to learn they could choose between slow performance and the laborious process of reformatting, followed by a full backup. Several companies wrote VMS defragmentation utilities, but Executive Software's Diskeeper was the most successful.

Diskeeper for NT

[A freeware version of this product is available on the Internet at http://www.execsoft.com]

What distinguishes Diskeeper from the DOS defragmenters you and I are familiar with (Norton SpeedDisk, for example) is this: Executive Software designed it for use on servers. For this reason, it doesn't normally run as a high-priority foreground process and doesn't lock up the entire disk. Instead, it runs as a low-priority background process (on NT, as a Service application), locking files one at a time as necessary. This behavior can take a little getting used to.
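You can see the flavor of that design in a few lines of Win32 C. This isn't Diskeeper's code, just a sketch of how any background worker on NT can demote itself so foreground tasks always win the processor:

#include <windows.h>

/* Drop a worker to the bottom of NT's priority ladder so
   anything else that wants the CPU preempts it. */
void BecomeBackgroundWorker(void)
{
    /* Put the whole process in the idle priority class ... */
    SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS);

    /* ... and drop this thread to the lowest level in that class. */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_IDLE);
}

Call something like this at startup and NT's scheduler does the rest: The worker runs only when nothing else wants to.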

Diskeeper installs from a regular Windows-based NT setup program. It requires administrative privileges during the setup and has one unusual feature: It patches four of the NT system files, including the operating system kernel. This is necessary for Diskeeper to get the low-level file system information it needs (rumor has it Microsoft will include some of the patch code in future versions of NT). Because of this, you'll have to reboot after Diskeeper installs, and you'll have to synchronize any upgrades of NT and Diskeeper. Currently, Diskeeper doesn't support NT 3.51, for instance. But my contacts at Executive Software tell me they have Diskeeper running on NT 3.51 at this writing, and will ship a new version "almost immediately" after NT 3.51 ships.

At any rate, once installation is complete, reboot your computer, and you'll have a new Program Manager group containing a Diskeeper icon. Double-clicking on that icon doesn't launch the core Diskeeper code--as an NT service, that's launched automatically when needed. What it does bring up is a graphical interface that lets you control Diskeeper.

The first thing you want to know is how badly fragmented your disk really is. Diskeeper gives you two ways to find out. The first is a quick graphical fragmentation monitor that gives you a continuous color-coded display illustrating how much of your disk is fragmented (see the sidebar below). A more detailed fragmentation analysis searches each and every file on the disk, counts which ones are fragmented and gives you an explicit statistical analysis. Having run both, I recommend the graphical monitor for quick checks (you need to take some action if more than half the disk is fragmented), and the full analysis for a more detailed look. I do wish the latter tool would let you specify a particular file of interest. Knowing how fragmented the pagefile is or whether an important database file is fragmented may be more important to you than knowing the average number of fragments per file.

Now, you can run Diskeeper in a single-pass mode that acts pretty much like a DOS-based defragmenter. Select Defragment/Single Pass, then go to lunch. That's probably what you'll want to do the first time, so you can see how much things improved. But it's not the ideal way to run Diskeeper, even on an NT Workstation.

Why not? Because as mentioned above, Executive Software designed Diskeeper as a background process. Therefore, it doesn't lock down the whole disk for a single-pass optimization, and it rarely (if ever) achieves its best results on one pass, as you can see from the figure above. A single pass improves matters, especially if the disk is severely fragmented. To get the full benefit, though, run Diskeeper more than once (overnight, for instance).

Fortunately, that's easy to do. Select Defragment/Set It and Forget It, and you're presented with a dialog that lets you pick the partition you want to defragment (as with the single-pass option). However, you can now control when defragmentation occurs with the Edit/Run Schedule command.

This gives you quite a bit of control over when and how Diskeeper runs. On my machine, I set it to run continuously (as many times as possible) between midnight and 1 a.m. on weekdays. The result, when I log onto the system, is a thoroughly defragmented disk.

Diskeeper runs as a background task with low priority, so any other task in the system will interrupt it. That's good: Letting it run in the background has no effect on foreground performance (I ran repeated NTHell and application tests to verify this). The flip side is that if your system is never idle, defragmentation never occurs, which can be a problem on busy servers.

Diskeeper also needs exclusive access to files. If you're running a database, such as Microsoft's SQL Server 4.21 for Windows NT or Oracle CARD, you'll need to stop it for Diskeeper to defragment the database files. And because exclusive access to the NT pagefile is never available while NT is running, Diskeeper can't directly defragment it. You can, however, defragment the rest of your system, create a new pagefile--which will be unfragmented--then delete the old one.
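In Win32 terms, exclusive access just means opening a file with no sharing allowed. This sketch is my illustration, not Diskeeper's actual code:

#include <windows.h>
#include <stdio.h>

/* Try to open a file exclusively (dwShareMode of 0). If another
   process--a running database, say--already has the file open,
   the call fails with ERROR_SHARING_VIOLATION. */
HANDLE OpenExclusive(const char *path)
{
    HANDLE h = CreateFile(path,
                          GENERIC_READ | GENERIC_WRITE,
                          0,               /* no sharing: exclusive */
                          NULL,
                          OPEN_EXISTING,
                          FILE_ATTRIBUTE_NORMAL,
                          NULL);

    if (h == INVALID_HANDLE_VALUE &&
        GetLastError() == ERROR_SHARING_VIOLATION)
        printf("%s is in use--skipping it for now\n", path);
    return h;
}

A defragmenter built this way can simply skip in-use files and come back to them on a later pass.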

On the whole, I'm quite impressed with Diskeeper. It does a good job with a minimum of fuss. But it may be overkill for some NT Workstation users. And there's room for a DOS-style offline defragmenter that would run automatically during NT's boot-up process. In the meantime, it's good to have a solution.

Antivirus for NT

As with disk defragmentation, NT antivirus software has been a sore point. Microsoft responds to questions about viruses by repeating the mantra "don't log in with administrative access." That's not really much help. It limits the ability of an NT-based virus (I've never heard of one, but you never know!) to access the hardware, but does nothing about a DOS-based boot-sector virus installed during a dual-boot. What about viruses that attack client systems? An NT server may be safe from attack, but it could still be a carrier.

I'm delighted to report two companies are now rising to the challenge with NT-based antivirus applications. Carmel Software (011-972-4-416976), based in Haifa, Israel, has a basic antivirus application in beta that you can download from Beverly Hills Software (http://www.bhs.com/application.center/). This, by the way, is an outstanding place to look for NT shareware and demos. I've also heard that Cheyenne Software (800-243-9462) is about to ship an NT version of its InocuLAN antivirus product. Both applications use standard techniques, searching files for a known bit pattern that would indicate the presence of a virus.

Now, if someone will just develop an undelete for NT ...

Book o' the month

This month's book is Fragmentation: The Condition, the Cause and the Cure by Craig Jensen (1994, Executive Software International). Jensen is president of Executive Software, and the book is his clear, concise explanation of what fragmentation is and what Diskeeper does about it. The one problem is that the book focuses on the VMS operating system rather than NT, so some of the specifics are different. Still, if you want to understand fragmentation in detail, you can do a lot worse.

John D. Ruley is WINDOWS Magazine's editor-at-large and resident Windows NT advocate. The second edition of his book Networking Windows NT (Wiley, 1993) is due out by the time you read this.


Programming Windows

Pick a Database That Runs on Its Records

by: Martin Heller

The two database experts blinked. I'd asked them how I should choose among Microsoft's database offerings.

"Well," one said, "I can't give you a general rule of thumb, but if you give me a concrete example, it'll be obvious."

"Okay here's a concrete example: You're in the hotel business and you want to write a central reservations system. It has to support distributed databases. The tables potentially have millions of records and a record has to come up very quickly.

"If a guest calls the main toll-free number and says, `Hi, this is John Smith from Omaha, and I'd like a room for next Monday,' the reservations clerk should answer, `So you're coming to Dubuque again, Mr. Smith? I can reserve your preferred nonsmoking room with a king-size bed overlooking the pool. Would you like to guarantee that for late arrival with your American Express card?' "

"How would you do that?"

"I'd do it with SQL Server on the back end and Visual Basic on the front end."

I smiled broadly. "That's exactly what I told my client at the time. Of course, since then, the client decided to use Delphi and Oracle instead.

"Okay, now try this one: You run a hospice, and your records system needs to track the dying patients, their doctors, their case workers and their families."

Microsoft's database guru started to look uncomfortable.

"It's all right. People do die, and a hospice makes the process more bearable for the patient and the bereaved. I do some pro bono consulting for them and make some bereavement calls. One problem this hospice has is generating all the paperwork required after a death--the sympathy cards, the mailing labels, the forms for volunteers who work with the bereaved. Right now the staff types everything over and over for the different forms. The database schema is pretty simple, and the biggest table runs to thousands of records--tens of thousands, tops."

The database guru gave me a big smile. "That's easy. I'd do that with Access."

"But what about Visual FoxPro?"

Visual FoxPro (800-426-9400) was what this meeting was ostensibly about.

"Well, sure, Fox could handle it, but Access would be a lot easier, especially if they themselves wanted to modify the forms later. They probably don't have too many technical people (I smiled thinly), but even the clerks should be able to add a field to an Access form using its visual design tools. Fox might be too complicated for them to maintain. If they needed hundreds of thousands of records, Fox would win because it scales better, but for thousands of records Access is just fine."

Eventually, we did get around to looking at Microsoft's Visual FoxPro, which I found impressive and a compelling choice for xBase developers who want to migrate to a visual, object-oriented environment. But either I'm easily impressed these days or database development products are getting very good indeed. I find Clarion for Windows (TopSpeed Corp., 800-354-5444) and Delphi for Windows (Borland International, 800-453-3375, x1309) equally impressive for various reasons. But I'll have to defer any further discussion of Visual FoxPro until I actually have a copy in my office.

First there was Turbo Pascal

Clarion and Delphi are both spiritual descendants of Turbo Pascal. Those of you up on Borland trivia will recall that a young Dane named Anders Hejlsberg wrote the original Turbo Pascal compiler and Philippe Kahn marketed it successfully for $49.95; another Dane, Niels Jensen, was one of Borland's founders. Borland rose from there, diversified, acquired, grew--and faltered.

At one crucial point, as the story goes, Jensen and his key people had their own C compiler under development when Philippe bought Wizard C instead and renamed it Turbo C. Jensen left, formed Jensen & Partners and came out with the TopSpeed compiler family. Eventually, Jensen & Partners (which had some great compiler and linker technology, but no database technology and not much marketing clout) merged with Clarion Corp. (which had some great database development technology and a decent installed base, but nothing worthwhile in the way of compilers). The resulting TopSpeed Corp. seemed to go quiet for more than a year and then surfaced with Clarion for Windows.

Meanwhile, back in Scotts Valley, Turbo Pascal went through revision after revision, picking up more depth with each iteration, moving nicely to Windows and adding object orientation. Then suddenly there were mutterings about this new Delphi product that was being called a VB-killer, a PowerBuilder-killer and a visual Pascal with reusable objects and integrated databases.

You've heard about Delphi by now. I've mentioned it before, and most of the magazines and journals for Windows programmers have written it up. It's a fine product, with in-memory compilation, two-way visual tools, a single-user version of InterBase Server, ReportSmith and all the Borland design and debugging tools, classes and samples. It requires a CD-ROM drive and at least 30MB of free disk space--or closer to 90MB if you want to do a full install. People keep writing me to ask how I like Delphi. I like it fine, except for the amount of disk space it hogs.

Nobody writes to ask me how I like Clarion, and I think that's a crying shame. It may not have all the polish and features of Delphi, but it's the most efficient database development package I've ever used--not to mention that it installs from five diskettes. Despite the Microsoft guru's recommendation of Access, I've decided to build the applications the hospice needs with Clarion. I haven't had much time to work on them yet, more's the pity, but when I do I'll let you know how they turn out.

Clarion feels right

I have begun to learn the Clarion environment. It's a bit different, but for a database application it feels right. If you can design a data dictionary and pick the right templates, you can have a working, compiled database application faster than you could ever have imagined. Clarion's Quick Start feature gets you from the data dictionary to a vanilla database application in minutes. And templates let you add standard functionality in hours rather than days.

So, what's a template? If you know what templates are in C++, you already have a pretty good idea. A template is a prewritten procedure skeleton that allows for parameter substitution. In Clarion, the parameters are things like tables, data fields, data validation ranges and referential integrity checks. In other words, if you want to implement a browse procedure in Clarion, you won't have to write any code. Just place a few templates and customize their properties, in much the same way as you'd place bound controls on a VB form. The major difference is that the Clarion template does a lot of stuff you'd have to hand-code in a VB application.

I'm a language junkie from way back. Clarion has its own fourth-generation language, which isn't exactly like Pascal or C or xBase. I haven't bothered to learn it. I haven't even needed to look at the generated source code. I just treat the entire Clarion development system as a black box that turns my design into an application.

I haven't gotten to the point of stressing the system yet. I haven't found out how well Clarion handles reading and writing locks in multi-user applications or how well it imports data in odd formats with entry errors or any of the other things that make or break a database system in real life. Stay tuned.

Speaking of tuning

Something interesting came up when I updated NTHell, my 32-bit benchmarks, to add some of the functionality from Wintune, WINDOWS Magazine's 16-bit benchmarks. This functionality includes a memory-to-memory copy speed test. Interesting, as in "Don't let this happen to you!"

When we run the 16-bit Wintune tests on Pentium machines, we typically see memory copy speeds on the order of 20MB per second. But when I built the 32-bit benchmark with VC++ 2.1, I got Pentium memory copy speeds on the order of 10MB per second. I'd expect the 32-bit results to be twice the 16-bit results for this operation, not half, and I didn't know what to make of it, so I quietly passed the program on to John Ruley, WinMag's editor-at-large.

John got really upset. That's okay, John does get upset, but good things usually come of it. He trotted out his own console-mode memory speed test, compiled it from the command line and demonstrated results on the order of 40MB per second. Then he recompiled for release from the VC++ environment and got results on the order of 10MB per second.

In other words, he managed to get the debug version to run four times faster than the optimized version.
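If you want to reproduce this at home, a console-mode memory speed test takes only a page of C. This is a sketch along the lines of John's test, not his actual code, and the buffer size and repetition count are arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MEM_SIZE (1024 * 1024)  /* copy 1MB per repetition */
#define MEM_REPS 100

int main(void)
{
    char *src = malloc(MEM_SIZE);
    char *dst = malloc(MEM_SIZE);
    clock_t start, stop;
    double seconds;
    int i;

    if (src == NULL || dst == NULL)
        return 1;
    memset(src, 1, MEM_SIZE);   /* touch the pages first */

    start = clock();
    for (i = 0; i < MEM_REPS; i++)
        memcpy(dst, src, MEM_SIZE);
    stop = clock();

    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    printf("Memory copy speed: %.1f MB per second\n",
           MEM_REPS / seconds); /* 1MB per rep, so reps/sec = MB/sec */
    return 0;
}

Build it once for debug and once for release under VC++ 2.1, and you can watch the mystery happen.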

Something was definitely fishy. My top two candidates were an inefficient intrinsic version of memcpy replacing an efficient memcpy function, or possibly an unrolled inline version of the loop around the memcpy call somehow defeating the memory caching. So I added some pragma lines to my code, and immediately got my factor of four back:

#pragma function(memcpy)   /* force a real call to the library
                              memcpy, not the intrinsic */
#pragma auto_inline(off)
void CopyTheMemory(void)
{
    register unsigned long i;

    /* the_mem, mem_reps and mem_size are defined elsewhere
       in the benchmark */
    for (i = 0; i < mem_reps; i++)
        memcpy(the_mem[0],
               the_mem[1],
               (unsigned int)mem_size);
}
#pragma auto_inline(on)

I knew one of my guesses was correct. But which one? I could have tried commenting out one pragma line at a time, but that wouldn't have been any fun. Instead, I generated assembly language listings for John's code, which is short.

In the debug version, I found, as expected, an actual call to the memcpy function. In the release version, I found instead a rep movsb instruction. Friends, the b suffix means the copy is done byte by byte. I knew from disassembling the 16-bit memcpy function that it does word-by-word copying when it can, and I expected the 32-bit memcpy function would copy doubleword by doubleword. To nail it down, I put a breakpoint on the call to memcpy and traced into the function in a disassembly window. Sure enough, I quickly came to a rep movsd instruction--four bytes per move to the intrinsic's one. There was my factor of four, neat and clean.

So, when is an optimization not an optimization? When somebody gets lazy about the code generated in the intrinsic version of a function. If you use Microsoft VC++ 2.x and call memcpy, I strongly recommend you add #pragma function(memcpy) to your code.

I wonder what other surprises are in store for me?

Martin Heller consults for a variety of businesses, illuminates dark code and writes in Andover, Mass.


Copyright © 1995 CMP Media Inc.